Long-Term Exploration in Persistent MDPs
نویسندگان
چکیده
Exploration is an essential part of reinforcement learning, which restricts the quality learned policy. Hard-exploration environments are defined by huge state space and sparse rewards. In such conditions, exhaustive exploration environment often impossible, successful training agent requires a lot interaction steps. this paper, we propose method called Rollback-Explore (RbExplore), utilizes concept persistent Markov decision process, in agents during can roll back to visited states. We test our algorithm hard-exploration Prince Persia game, without rewards domain knowledge. At all used levels outperforms or shows comparable results with state-of-the-art curiosity methods knowledge-based intrinsic motivation: ICM RND. An implementation RbExplore be found at https://github.com/cds-mipt/RbExplore.
منابع مشابه
Long-term persistent fetomaternal hemorrhage
It is not clear that how long the affected fetuses can tolerate fetomaternal hemorrhage (FMH). Incidental serial measurements of the fetal peak systolic velocity of the middle cerebral artery and the retrospective analysis of stocked blood available incidentally indicated that our patient had suffered from FMH for at least 2 weeks prior to delivery.
متن کاملExploration-Exploitation in MDPs with Options
While a large body of empirical results show that temporally-extended actions and options may significantly affect the learning performance of an agent, the theoretical understanding of how and when options can be beneficial in online reinforcement learning is relatively limited. In this paper, we derive an upper and lower bound on the regret of a variant of UCRL using options. While we first a...
متن کاملPlants in Long Term Lunar Exploration
IntroductionScience and Exploration Rational: Plant are complex eukaryotic organisms that share fundamental metabolic and genetic processes with humans and all higher organisms, yet their sessile nature requires that plants deal with their environment by adaptation in situ. Thus plants have evolved to deal with environmental change and stress by responding with changes in metabolism in order to...
متن کاملAutonomous Exploration For Navigating In MDPs
While intrinsically motivated learning agents hold considerable promise to overcome limitations of more supervised learning systems, quantitative evaluation and theoretical analysis of such agents are difficult. We propose to consider a restricted setting for autonomous learning where systematic evaluation of learning performance is possible. In this setting the agent needs to learn to navigate...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: Lecture Notes in Computer Science
سال: 2021
ISSN: ['1611-3349', '0302-9743']
DOI: https://doi.org/10.1007/978-3-030-89817-5_8